Towards a Bayesian Perspective on Statistical Disclosure Limitation

Author

  • LAWRENCE H. COX
Abstract

National statistical offices and other organizations collect data on individual subjects (persons, businesses, organizations), typically assuring each subject that data pertaining to them will be held confidential. These data provide the raw material for the statistical data products (tabular summaries, microdata files consisting of data records pertaining to individual subjects, and, potentially, public statistical databases and statistical query systems) that the statistical office disseminates to multiple, broad user communities. Statistical disclosure limitation (SDL) refers to the problem of, and methods for, preventing the re-identification of a subject, and the disclosure of the subject’s confidential data, through analysis or manipulation of disseminated data products. SDL methods abbreviate or modify the data product sufficiently to thwart disclosure. SDL problems are typically computationally demanding; several have been shown to be NP-hard. Many SDL methods draw upon statistical, mathematical or optimization theory, but heuristic and partial approaches also abound. Contributions from Bayesian and likelihood perspectives are increasing. Nevertheless, a strong theoretical connection between definitions of statistical disclosure, measurement of disclosure risk, and evaluation of SDL methods is lacking. This suggests opportunities for Bayesian, likelihood and hierarchical approaches. Selected opportunities and associated SDL methodological issues are discussed.

Similar resources

Assessing the Risk of Disclosure of Confidential Categorical Data

Disclosure limitation involves the application of statistical tools to limit the identification of information on individuals (and enterprises) included as part of statistical data bases such as censuses and sample surveys. We outline the major issues involved in assessing disclosure risk and assuring the protection of confidentiality for data bases, especially those in the form of multi-way co...

Privacy and Statistical Risk: Formalisms and Minimax Bounds

We explore and compare a variety of definitions for privacy and disclosure limitation in statistical estimation and data analysis, including (approximate) differential privacy, testing-based definitions of privacy, and posterior guarantees on disclosure risk. We give equivalence results between the definitions, shedding light on the relationships between different formalisms for privacy. We also...

Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets

Many national statistical agencies release data to the public that have been altered to protect the confidentiality of data subjects’ identities and sensitive attributes. Unfortunately, for methods of disclosure limitation in practice, it is typically impossible for analysts to gauge how the disclosure limitation has compromised the quality of inferences from the altered data alone. This is par...

Additive noise and multiplicative bias as disclosure limitation techniques for continuous microdata: A simulation study

This paper focuses on a combination of two disclosure limitation techniques, additive noise and multiplicative bias, and studies their efficacy in protecting confidentiality of continuous microdata. A Bayesian intruder model is extensively simulated in order to assess the performance of these disclosure limitation techniques as a function of key parameters like the variability amongst profiles ...

Intruder Testing on the 2011 UK Census: Providing Practical Evidence for Disclosure Protection

With the recent push towards sharing greater amounts of information, the pressure is on National Statistical Institutes (NSIs) to publish more detailed datasets to broader audiences. It is of parallel importance for any such organisation to respect and protect the confidentiality of respondents’ data. Assessing the risk of identification in a dataset is a challenging task and there is much in t...

Publication date: 2001